Use of the PsycheMERGE Network to Investigate the Association Between Depression Polygenic Scores and White Blood Cell Count

Key Points
Question: Are genes that increase predisposition to depression associated with increased inflammatory biomarkers, specifically white blood cell count?
Findings: In this genetic association study of 382 485 participants, an association was noted between depression polygenic scores and white blood cell count across 4 independent biobanks. Mediation analyses suggest a bidirectional association between white blood cell count and depression diagnosis and implicate neutrophils as the main driver of the association.
Meaning: These findings suggest that genes associated with depression (rather than only the clinical presentation of depressive symptoms) may be implicated in the proinflammatory state observed in clinical depression; this outcome may motivate future development of targeted biomarker panels and treatments.

This supplementary material has been provided by the authors to give readers additional information about their work.

Icahn School of Medicine at Mount Sinai
The BioMe Biobank, at the Icahn School of Medicine at Mount Sinai, is an EHR-linked biobank of participants from the Mount Sinai Health System in New York, NY. Recruitment into BioMe has been ongoing since 2007, predominantly from general medicine and primary care clinics, with the remainder from specialty practices and recruitment events. BioMe participants consent to provide DNA and plasma samples linked to their de-identified EHRs and then provide additional information on self-reported ancestry, health behaviors, and medical history through questionnaires administered at enrollment.

Massachusetts General Brigham Biobank (MGBB)
The Massachusetts General Brigham Biobank, formerly known as the Partners HealthCare Biobank, is an ongoing virtual cohort study of patients across the Massachusetts General Brigham (MGB) hospital system (including Brigham and Women's Hospital, Massachusetts General Hospital, and other affiliated hospitals) that provides a large-scale resource of linked longitudinal electronic health record (EHR) data, genomic data, and self-reported survey data 3. All patients provided informed consent before enrollment, and all study procedures were approved by the Massachusetts General Brigham Institutional Review Board.

Lab Quality Control
Labs were required to have at least 70% of observations recorded in a single set of units and were filtered to those with at least 1,000 observations across at least 100 individuals. Observations more than 4 standard deviations from the sample mean were excluded to remove extreme values, including biologically implausible ones. For each individual, the median observation for each lab was selected and adjusted for cubic splines of age at measurement. The age-adjusted values were then normalized using a rank-based inverse normal transformation (INT) to obtain a normal distribution for downstream analyses, generating age-adjusted INT lab values. For genetic analyses, labs with no measurable heritability in GREML analysis using the GCTA software 4 were excluded, leaving 315 labs.
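The lab quality-control steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline, and several details are assumptions: observations in non-modal units are dropped before the count filters, a person's median is taken as the lower median when they have an even number of values, the age spline uses a truncated power basis with interior knots at age quintiles, and the INT uses the Blom offset (3/8) with average ranks for ties. The GCTA GREML heritability filter is not shown. All function names are hypothetical.

```python
import statistics
from collections import Counter, defaultdict

import numpy as np


def filter_lab(records, min_unit_frac=0.70, min_obs=1000, min_people=100):
    """records: (person_id, value, unit, age) tuples for one lab.
    Returns (passes, kept), where kept holds (person_id, value, age) in the modal unit."""
    if not records:
        return False, []
    unit, n_unit = Counter(r[2] for r in records).most_common(1)[0]
    if n_unit / len(records) < min_unit_frac:
        return False, []
    # Assumption: observations recorded in other units are dropped before counting.
    kept = [(pid, val, age) for pid, val, u, age in records if u == unit]
    passes = len(kept) >= min_obs and len({pid for pid, _, _ in kept}) >= min_people
    return passes, kept


def drop_outliers(obs, n_sd=4.0):
    """Remove observations more than n_sd standard deviations from the sample mean."""
    vals = [v for _, v, _ in obs]
    mu, sd = statistics.fmean(vals), statistics.stdev(vals)
    return [o for o in obs if abs(o[1] - mu) <= n_sd * sd]


def person_medians(obs):
    """Keep each person's median observation and the age at which it was measured.
    Assumption: the lower median is used when a person has an even number of values."""
    by_person = defaultdict(list)
    for pid, val, age in obs:
        by_person[pid].append((val, age))
    return {pid: sorted(rows)[(len(rows) - 1) // 2] for pid, rows in by_person.items()}


def age_adjust(values, ages, n_knots=4):
    """Residualize values on a cubic regression spline of age
    (truncated power basis; knots at age quantiles are an assumption)."""
    y = np.asarray(values, dtype=float)
    a = np.asarray(ages, dtype=float)
    a = (a - a.mean()) / a.std()  # scale age for numerical stability
    knots = np.quantile(a, np.linspace(0, 1, n_knots + 2)[1:-1])
    X = np.column_stack([np.ones_like(a), a, a**2, a**3]
                        + [np.clip(a - k, 0, None) ** 3 for k in knots])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def rank_inverse_normal(x, c=3 / 8):
    """Rank-based inverse normal transform (Blom offset; ties receive average ranks)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ranks = np.empty(n)
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, n + 1)
    _, inv = np.unique(x, return_inverse=True)
    ranks = (np.bincount(inv, weights=ranks) / np.bincount(inv))[inv]
    nd = statistics.NormalDist()
    return np.array([nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks])


def process_lab(records):
    """Per-lab pipeline; returns {person_id: age-adjusted INT value}, or None if filtered out."""
    passes, kept = filter_lab(records)
    if not passes:
        return None
    med = person_medians(drop_outliers(kept))
    pids = list(med)
    adj = age_adjust([med[p][0] for p in pids], [med[p][1] for p in pids])
    return dict(zip(pids, rank_inverse_normal(adj)))
```

Taking the median before age adjustment keeps one value per person, so individuals with many lab draws do not dominate the spline fit or the ranks used by the INT.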

Phecode Definitions
Phecodes are used in phenome-wide association scans (PheWAS): ICD codes are hierarchically grouped into 'phecodes' based on phenotypic similarity. In the LabWAS sensitivity analyses, we controlled for depression, anxiety, and adjustment reaction diagnoses, defined as phecodes 296.2, 300.1, and 304, respectively. For all analyses involving phecodes, cases were required to have at least two instances of component ICD codes, and controls were required to have zero component ICD codes and zero phecode exclusion codes as defined by the phecode map in the PheWAS R package v0.99 5. Individuals with only one component ICD code were excluded.
For the PheWAS analyses conducted as part of the VUMC sensitivity analyses, phecodes were required to have at least 100 cases to be included in the scans.
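The case/control rules above can be illustrated with a short sketch. It assumes each person's ICD codes are available as a list with one entry per recorded instance, and that the phecode's exclusion range has already been expanded into a set of ICD codes; the function names and toy codes are hypothetical and do not reproduce the PheWAS R package.

```python
def assign_phecode_status(person_codes, component_codes, exclusion_codes, min_count=2):
    """Return {person_id: 1 (case), 0 (control), None (excluded)} for one phecode.

    person_codes: {person_id: list of ICD codes, one entry per recorded instance}
    component_codes: ICD codes that map to the phecode
    exclusion_codes: ICD codes in the phecode's exclusion range (assumed pre-expanded)
    """
    status = {}
    for pid, codes in person_codes.items():
        n_component = sum(1 for c in codes if c in component_codes)
        if n_component >= min_count:
            status[pid] = 1  # case: at least two component code instances
        elif n_component == 0 and not any(c in exclusion_codes for c in codes):
            status[pid] = 0  # control: no component codes and no exclusion codes
        else:
            status[pid] = None  # one component code, or an exclusion code present
    return status


def phecodes_for_phewas(status_by_phecode, min_cases=100):
    """Keep phecodes with at least min_cases cases."""
    return [ph for ph, st in status_by_phecode.items()
            if sum(1 for s in st.values() if s == 1) >= min_cases]
```

Excluding people with a single component code avoids treating a possibly spurious one-off diagnosis as either a case or a clean control.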

Depression PGS and WBC Mediation Analysis
While mediation analysis can readily be performed with a continuous exposure (here, the MDD-PGS), the "proportion of variance mediated" cannot be interpreted on a continuous scale. Instead, the contrast must be specified between two discrete levels of the exposure: a reference level (average MDD-PGS) and a comparison level (high MDD-PGS). We selected individuals at the 50th percentile to represent the average MDD-PGS and tested three comparison levels: individuals at the 85th, 90th, and 95th percentiles. Because the proportion mediated did not differ meaningfully across the three comparison levels, we chose the 90th percentile as representative of the "high MDD-PGS" in the main table and provide all results in the supplementary table.
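To show why two exposure levels are needed, here is a simplified point-estimate sketch of mediation at a 50th-versus-90th percentile PGS contrast. It assumes one direction (PGS to WBC to depression), a linear mediator model, and a logistic outcome model, and it keeps each person's observed mediator residual instead of simulating draws, so it gives no confidence intervals. The quantity returned as the proportion mediated is the average causal mediation effect divided by the total effect. This is not the authors' software, and all names are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Logistic regression coefficients via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def mediation_at_percentiles(pgs, mediator, outcome, covars, ref_pct=50, cmp_pct=90):
    n = len(pgs)
    C = np.column_stack([np.ones(n), covars])  # intercept + covariates
    ctrl, treat = np.percentile(pgs, [ref_pct, cmp_pct])

    # Mediator model (assumed linear): mediator ~ covariates + PGS
    Xm = np.column_stack([C, pgs])
    a = fit_ols(Xm, mediator)
    resid = mediator - Xm @ a

    # Outcome model (assumed logistic): outcome ~ covariates + PGS + mediator
    b = fit_logistic(np.column_stack([C, pgs, mediator]), outcome)

    def mediator_at(x):
        # Each person's mediator with PGS set to x, keeping their own residual
        return C @ a[:-1] + a[-1] * x + resid

    def mean_prob(x, m):
        # Average predicted outcome probability with PGS set to x and mediator set to m
        eta = C @ b[:-2] + b[-2] * x + b[-1] * m
        return np.mean(1.0 / (1.0 + np.exp(-eta)))

    acme = mean_prob(treat, mediator_at(treat)) - mean_prob(treat, mediator_at(ctrl))
    ade = mean_prob(treat, mediator_at(ctrl)) - mean_prob(ctrl, mediator_at(ctrl))
    total = acme + ade
    return {"acme": acme, "ade": ade, "total": total, "prop_mediated": acme / total}
```

Calling the function with `cmp_pct` set to 85, 90, and 95 mirrors the sensitivity check described above. Because the outcome model is nonlinear, the effects on the probability scale depend on which two PGS values are compared.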

Depression PGS and WBC-differential Mediation Analysis
In a multiple mediator analysis, a single main mediator and additional alternative mediators are specified. A structural equation modeling approach is used to estimate the indirect effect of the exposure on the outcome through the main mediator after controlling for the correlation structure between the alternative mediators and the outcome 6. All measurements for an individual were required to be recorded on the same date to ensure they came from the same WBC differential (N=24,383). For individuals with multiple WBC differentials in their EHR, the median WBC value and the corresponding absolute subtype values from the same differential were selected. All measurements were adjusted for cubic splines of age at observation and normalized 8,9.
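The data-selection rule for the multiple mediator analysis can be sketched as follows. It assumes long-format lab records of (person_id, date, lab_name, value), a five-part differential (the subtype names below are an assumption), that the last value wins if a lab appears twice on one date, and that the lower median is used for an even number of panels. Function names are hypothetical.

```python
from collections import defaultdict

SUBTYPES = ("neutrophils", "lymphocytes", "monocytes", "eosinophils", "basophils")


def build_panels(measurements):
    """measurements: (person_id, date, lab_name, value) tuples.
    Group by person and date; keep only dates where total WBC and every subtype were recorded."""
    by_day = defaultdict(dict)
    for pid, date, lab, value in measurements:
        by_day[(pid, date)][lab] = value  # assumption: last value wins on duplicates
    needed = ("wbc",) + SUBTYPES
    panels = defaultdict(list)
    for (pid, _date), labs in by_day.items():
        if all(k in labs for k in needed):
            panels[pid].append({k: labs[k] for k in needed})
    return panels


def select_median_differential(panels):
    """For each person, return the complete panel with the median total WBC
    (lower median for an even number of panels), so subtypes come from the same draw."""
    chosen = {}
    for pid, ps in panels.items():
        ps = sorted(ps, key=lambda p: p["wbc"])
        chosen[pid] = ps[(len(ps) - 1) // 2]
    return chosen
```

Choosing a whole panel, rather than taking the median of each subtype separately, keeps the subtype counts internally consistent with the selected total WBC. The selected values would then go through the same age-spline adjustment and normalization used for other labs.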

Controlling for the impact of WBC genetics
We first tested the association between WBC PGS and depression diagnosis. The WBC PGS were constructed using PRS-CS-auto with the 1000 Genomes Project Phase 3 European subset as the reference panel and weights from the UK Biobank WBC summary statistics 10. WBC PGS were z-score scaled so that effect estimates are per standard deviation increase in PGS. As a check, we confirmed that the WBC PGS was significantly associated with measured WBC (p < 2.23 × 10⁻³⁰⁸; beta = 0.14). We then tested the association between WBC PGS and depression diagnosis (defined as phecode 296.2) across all biobanks, controlling for sex, median age across the medical record, and the top 10 genetic principal components, using a linear regression model.
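A minimal sketch of the z-scored PGS association test is shown below, assuming an ordinary least squares fit (for a binary depression outcome this is a linear probability model) and a large-sample normal approximation for the p-value. The bound of 2.23 × 10⁻³⁰⁸ reported above matches the smallest normalized double-precision value (`sys.float_info.min`), which is why extremely small p-values are reported as an upper bound; the formatting helper assumes that convention. Function names are hypothetical.

```python
import math
import sys

import numpy as np


def zscore(x):
    """Scale to mean 0, SD 1 so effects are per SD of PGS."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


def ols_assoc(y, exposure, covars):
    """OLS of y on exposure plus covariates (intercept added).
    Returns (beta, se, t) for the exposure term."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), exposure, covars])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1], se, beta[1] / se


def two_sided_p(t):
    """Large-sample (normal approximation) two-sided p-value."""
    return math.erfc(abs(t) / math.sqrt(2))


def format_p(p):
    """Report p-values below the smallest normal double (~2.23e-308) as a bound."""
    if p < sys.float_info.min:
        return f"p < {sys.float_info.min:.3g}"
    return f"p = {p:.3g}"
```

Here `covars` would hold sex, median age, and the top 10 principal components as columns.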
In a separate series of analyses, we examined the influence of WBC genetic factors within the depression PGS. First, to remove the genetic effects of WBC, the depression summary statistics were conditioned on the WBC summary statistics using multi-trait-based conditional and joint analysis (mtCOJO) 11 in GCTA version 1.91.4. Next, conditioned depression PGS were constructed from the conditioned depression summary statistics. Finally, we tested the association between the conditioned depression PGS and median age-adjusted, INT-normalized WBC measurements, controlling for sex and the top 10 principal components of ancestry. The effect estimates of each analysis from all four sites were meta-analyzed using a fixed-effect inverse-variance-weighted model in the meta R package 7.
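The fixed-effect inverse-variance-weighted pooling can be written in a few lines. This is a minimal sketch of the standard formula (weights 1/SE², pooled SE = sqrt(1/Σw), normal-approximation p-value), not the meta R package itself; the function name is hypothetical.

```python
import math


def ivw_fixed_effect(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of per-site estimates.
    Returns (pooled beta, pooled SE, z, two-sided p)."""
    weights = [1.0 / se**2 for se in ses]  # more precise sites get more weight
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    return beta, se, z, math.erfc(abs(z) / math.sqrt(2))
```

Because weights scale with the inverse variance, a site with a much larger sample, such as the largest contributing biobank, can dominate the pooled estimate.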

Controlling for the impact of WBC genetics
The LabWAS and phe-group conditional analyses confirmed a correlation between increased depression PGS and elevated WBC that is unlikely to be due to an underlying comorbid confounder. However, multiple hypotheses could explain these results. First, increased depression PGS may lead to elevated WBC through mechanisms that directly (e.g., hematopoiesis) or indirectly (e.g., cortisol) activate the immune system. Alternatively, because WBC is itself a highly heritable trait (h² = 14% to 40% 12), the GWAS used to train the depression PGS may include WBC genetic associations, due either to phenotypic hitchhiking in the original ascertainment of depression or to a real contribution from the heritable component of WBC acting as an independent risk factor for depression. In either case, depression cases would have higher WBC PGS than controls, which would also be reflected in the depression PGS and in the resulting genetic correlation between WBC and depression in independent samples.
To address these complicating issues, we undertook a series of analyses. First, we calculated the genetic correlation between depression and WBC, which was nonsignificant (rg = 0.027, p-value = 0.268). Next, the weights for the depression PGS were conditioned on the weights from the WBC GWAS using mtCOJO to adjust for the effects of WBC on depression genetic risk. A conditional depression PGS built from the conditioned weights (MDD|WBC PGS) remained associated with WBC (p-value = 3.60 × 10⁻¹⁰⁸, beta = 0.035), indicating that the majority of the association between depression PGS and measured WBC arises not from the heritable component of WBC but from genes that increase risk for depression (Supplementary Figure 4, Supplementary Table 9).
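As we understand the mtCOJO method cited above (ref 11), the conditioning step adjusts each SNP's depression effect by removing the part expected to act through WBC, using a summary-data Mendelian randomization estimate of the effect of WBC on depression. The conditioned PGS is then a weighted sum of allele dosages. This is a simplified sketch: the full method also propagates uncertainty into the adjusted standard errors, which is omitted here, and the notation is our own ($j$ indexes SNPs, $i$ individuals, $x_{ij}$ is an allele dosage, and $\tilde{w}_j$ is the PGS weight derived from the conditioned summary statistics).

```latex
\hat{b}_{\mathrm{MDD}\mid\mathrm{WBC},\,j} \;=\; \hat{b}_{\mathrm{MDD},\,j} \;-\; \hat{b}_{\mathrm{WBC},\,j}\,\hat{b}_{\mathrm{WBC}\to\mathrm{MDD}},
\qquad
\mathrm{PGS}^{\mathrm{MDD}\mid\mathrm{WBC}}_{i} \;=\; \sum_{j} x_{ij}\,\tilde{w}_{j}
```

If the causal estimate $\hat{b}_{\mathrm{WBC}\to\mathrm{MDD}}$ is near zero, the conditioned effects are close to the original ones, which fits a conditional PGS that still associates with measured WBC.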
Next, we tested the association between WBC PGS and depression diagnosis in each biobank. The meta-analysis across the four sites suggested a small but significant association between WBC PGS and depression (p-value = 3.52 × 10⁻⁵, beta = 0.015), but this association was observed only in MVP, which contributed the largest sample size to the meta-analysis. In VUMC, depression status was not significantly associated with WBC levels (p-value = 0.053, beta = 0.009, SE = 0.01).

eFigure 1. LabWAS of Depression PGS in VUMC controlled for a) sex and top 10 principal components of ancestry, b) depression diagnosis, c) depression and anxiety diagnoses, d) depression, anxiety, and adjustment reaction, e) diagnoses for depression, anxiety, adjustment reaction, and median BMI, f) diagnoses for depression, anxiety, adjustment reaction, tobacco use disorder, and median BMI, and g) diagnoses for depression, anxiety, adjustment reaction, median BMI, and smoking ever documented in the EHR. Depression, anxiety, adjustment reaction, and tobacco use disorder diagnoses were defined as phecodes 296.2, 300.1, 304, and 318, respectively. The red line indicates the Bonferroni threshold for statistical significance (p < 1.58 × 10⁻⁴), and the blue line indicates a p-value of 0.05. Upward triangles indicate that the PGS is associated with increased levels of the lab, while downward triangles indicate an association with reduced levels of the lab.
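The Bonferroni threshold in the LabWAS figures can be checked with a short calculation. This assumes the correction is taken over the 315 labs retained after quality control and the heritability filter; the result, about 1.587 × 10⁻⁴, agrees with the reported threshold of 1.58 × 10⁻⁴ when truncated.

```python
N_LABS = 315  # assumption: labs retained after QC and the heritability filter
ALPHA = 0.05  # family-wise error rate

bonferroni_threshold = ALPHA / N_LABS
print(f"Bonferroni threshold: {bonferroni_threshold:.3e}")  # prints "Bonferroni threshold: 1.587e-04"
```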

eTable 10. Association Between Depression PGS and WBC Levels Controlled for Common Phenotype Groups in VUMC
The association was adjusted for each phenotype group separately and, in the "All" phenotype group, for all groups in a single analysis. Associations were estimated using linear regression controlling for sex and the top 10 principal components of ancestry.